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Abstract 

Smearing the gauge links of dynamical configurations removes small scale unphysi- 
cal vacuum fluctuations und thus improves the chiral properties of lattice fermions. We 
present a new algorithm for the simulation of dynamical fermions coupled via smeared 
links based on the standard pure gauge overrelaxation and heatbath updatings. Smeared 
links play a fundamental role in making this algorithm effective. At fixed lattice spacing 
the computational cost of the algorithm has an extra volume factor due to the finite vol- 
ume of the lattice region which can be updated. As the continuum limit is approached 
the physical volume of the updated region remains constant. We simulated four flavors 
of staggered fermions coupled via hypercubic (HYP) smeared links. The simulation cost 
of the new algorithm on 10fm 4 volumes is a factor 2-8 larger than with the standard 
Hybrid Monte Carlo but the improved properties of the HYP action allow to gain a 
factor 2 in the lattice spacing. The new algorithm could be applicable to simulations of 
more complicated chiral fermionic actions, like overlap or perfect actions. 
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1 Introduction 



In this article we propose an algorithm for dynamical simulations of smeared link 
fermions. We consider a system described by the action 

S = S g (U)-trhx[Q\V)Q(V)] , (1) 

where S g (U) is the pure gauge action and QiV) is the fermionic matrix. The gauge 
connections between the fermions are smeared links V which are constructed determin- 
istically from the dynamical thin links U. Since each smeared link is a local combination 
of a finite number of thin links, the system where the fermions couple via smeared links 
is in the same universality class as the system with thin links. In the action we keep 
the smearing of the links unchanged as the continuum limit is approached. 

Smearing of the gauge links in the action and operators of a theory is part of 
many improvement programs. It has been demonstrated that smeared links improve 
flavor symmetry with staggered fermions and chiral symmetry with Wilson-type 

clover fermions §. They are useful in the construction of an overlap Dirac operator 
JUJIQ and arise naturally in the framework of perfect actions [§,|s|JlO|l . As an example 
for improved operators we mention the classically perfect Polyakov loop in pure gauge 
theory In order to eliminate lattice artifacts the static source has to become an 

extended object. Wilson loops constructed from smeared links were found to improve 



the statistical accuracy of potential measurements in [12,13| also. 

The problem with eq. (|l]) is that the standard simulation algorithms for dynamical 
fermions (Hybrid Monte Carlo (HMC) or R algorithms) involve the computation of 
the gauge force dQ(V)/dU , which is either very complicated or likely impossible if the 
smeared links are projected onto SU(3). Present large scale dynamical simulations with 
standard algorithms 0,^] use one level of non-projected smearing. Quenched studies 
indicate that it would be desirable to simulate dynamical fermions with smoother links, 
either obtained by a more complex and effective smearing procedure like the hypercubic 



(HYP) blocking 16] or by iterative smearing with projection onto SU(3) after each 
smearing step. 

2 The HYP action 

The definition of HYP links and their properties are discussed in |l6|Jl2^ ] . In the natural 
formulation of four flavors of staggered quarks, the continuum SU(4) flavor symmetry is 
broken to U(l) at finite lattice spacing. The pion spectrum has only one true Goldstone 
pion 7Tg, the other 14 pions remain massive when the quark mass approaches zero. The 
mass splitting between a non-Goldstone pion ir and ttq can be parametrized by the 
quantity pi 
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Figure l: Scaling behavior of different staggered actions. All data are extrapolated to 
ma/rrip = 0.52 except the Asqtad action with Symanzik improved gauge action which is 
at ma/nip = 0.538(2) and the dynamical thin link action which is at mc/iTip = 0.56(2). 



evaluated at a fixed mc/nip ratio. With a flavor symmetric action #2 = for all the 
pions. 

We computed #2 at ms/nip = 0.55 for the two lightest non-Goldstone pions, 71^5 
with flavor structure 7^75 and nij with flavor structure 7i7j, on a set of quenched 8 3 x 24 
lattices generated with Wilson pure gauge action at f3 = 5.7 (a = 0.17fm), using three 
different valence quark actions: standard staggered action with thin or HYP smeared 
links and improved Asqtad action of the MILC collaboration The results are shown 
in table [l]. Flavor symmetry violations with the HYP action are reduced by close to 
an order of magnitude with respect to the thin link action and by a factor of two with 
respect to the Asqtad action. 

Table l: Flavor symmetry breaking for different valence quark actions at lattice spacing 
a = 0.17fm. The parameter 82 is defined in eq. (|2[) and is evaluated at mQ/m p = 0.55. 



Action/#2 


Thin 


Asqtad 


HYP 




0.594(25) 


0.191(22) 


0.086(14) 




0.72(6) 


0.32(4) 


0.150(24) 



The HYP blocking was designed to improve flavor symmetry by reducing the fluc- 
tuations of the gauge links that couple to the different flavor and Dirac components 



2 



of staggered fermions inside a hypercube. The parameters of the HYP transformation 
have been optimized non-perturbatively to minimize the short distance fluctuations of 
the smeared gauge field flqj . A first non-trivial check of the HYP construction comes 
from perturbation theory: The cancellation of the flavor symmetry violating terms at 
tree level in the HYP staggered action gives consistent results for the HYP parameters 
1 17]. We looked further at the scaling properties of the HYP action. Figure |l] shows the 



rho mass as a function of the lattice spacing squared, all in units of ro ||18|| . In order to 
compare a result from a dynamical HYP simulation at gauge coupling /3 = 5.1 and bare 
quark mass am = 0.04, we chose to extrapolate the quenched data to mQ/m p = 0.52, a 
slightly different value than the one used in table |]. The quenched results correspond 
to (3 = 5.7 (ro/a = 2.96(2)) and j3 = 6.0 (r^/a = 5.36(4)) with the Wilson gauge action 
and to p = 8.0 (r /a = 3.687(13)) and (3 = 8.4 (r /a = 5.191(16)) with the Symanzik 



improved gauge action and Asqtad quark action [19]. At (3 = 6.0 we used for the thin 
link action data from ref. |2(|. The quenched points show clearly the good scaling 
properties of the HYP action. The available dynamical data are in agreement with the 
quenched ones. 

3 The New Algorithm 

We propose a two-step updating to simulate the system described by an action S as 
given in eq. ([l|) 

1) update a set of thin links {U} — > {U'} in a way that satisfies detailed balance for 

Sg(U) 

2) accept/reject the proposed change with acceptance probability 

qHv')q(V) 



det 

P acc = min ( 1, 



' det[Q^V)Q(V)] 
where V' are the smeared links constructed from the updated thin links. 



(3) 



Since the smeared links V are constructed deterministically form the thin links U, they 
are not dynamical variables and an ergodic update of the U links will correctly simulate 
the system. The proof that the above two-step updating satisfies detailed balance for 
the action S follows closely the proof given in [gl]] for a similar two-step decomposition 
of the action. 

In the practical implementation of the algorithm, our choice for S g is the Wilson 
plaquette action and the updating in step 1) is performed either with micro canonical 
overrelaxation p2,23| or with Cabibbo-Marinari heatbath [24|. We choose the subset of 



thin links to be updated differently for the overrelaxation and for the heatbath steps. 
For the overrelaxation we reflect all the links within some finite block of the lattice. The 
location of the block is chosen randomly but its dimensions are fixed. We choose with 
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probability 1/2 a given sequence of reflections within this block or with equal probability 
the reversed sequence. The sequence has to be reversed with respect to the direction 
and location of the thin links and with respect to the index of the SU(2) subgroup used 
in the reflection step. For the heatbath we choose the subset of thin links randomly, 
i.e. we choose a random direction, a random parity, a set of random sites and a random 
sequence of SU(2) subgroups. The probability of generating a given sequence of thin 
link updates or the reversed sequence is then the same, both for overrelaxation and 
heatbath. From this it follows that detailed balance with respect to the pure gauge 
action S g is satisfied for both updatings. We chose to update a contiguous block of links 
with overrelaxation in order to propagate a change through (a portion of) the lattice. 
We observed that a random sequence of overrelaxed reflections is less efficient, especially 
for changing the topological charge of the configurations. This can be understood by 
thinking of instantons as extended objects. One has to change an entire region of the 
lattice to destroy or to create them. 

In step 2) we use a stochastic estimator to evaluate the ratio of fermionic determi- 
nants 

P' &cc = min{l,expA5} (4) 
AS = tf[QHV)Q(y')-Qi(y)Q(V)]t, (5) 

where the vector £ is generated according to the probability distribution 

P(0cxexp{-^Q\V')Q(V')£}. (6) 

The key feature is that smeared links constrain the statistical fluctuations of the stochas- 
tic estimator and make the algorithm effective, as we will show in the next section for 
the case of HYP staggered fermions. 

Yet another interesting question is about the actual value of the ratio of fermionic 
determinants in eq. (||). This value is the natural limit on the performance of the 
algorithm, something like the Carnot efficiency of an heat engine. No matter how 
the accept/reject step 2) is performed, the detailed balance condition implies that the 
maximal acceptance is given by the Metropolis acceptance eq. (||) Q. We use a stochastic 
estimator to compute the acceptance and this reduces in principle the efficiency of the 
algorithm. The actual value of the ratio of fermionic determinants depends on the 
change in the gauge configuration. A larger number of changed thin links U leads to a 
smaller value of the determinant ratio. 

4 Four flavors of HYP staggered fermions 

In the following we describe the application of the new algorithm to the case of n/ =4 
flavors of staggered fermions coupled via hypercubic (HYP) smeared links. The matrix 

1 We thank U. Wolff for emphasizing this point. 
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Q in eq. (H) is given by 

Qi,j(V) = 2{am)5 ijj + D^V) , (7) 



If we take the matrix Q^Q to be defined on the even sites of the lattice only pq| , eq. (Q) 
describes four flavors of staggered fermions coupled via HYP smeared links V (rji } a are 
the staggered phases). To further reduce the fluctuations of the stochastic estimator in 
the accept /reject step eq. (0), we remove from the fermion matrix its most ultraviolet 
part by decomposing it as 

q(v) = Q r (y)A(y) , (8) 

A(V) = exp[a 4 L> 4 (F) + a 2 D 2 (V)} . (9) 

This way we achieve that an effective action 

5 eff = -2a 4 Re trD 4 - 2a 2 Re trD 2 (10) 

is removed from the fermionic determinant. The acceptance probability in eq. (|j) 
becomes 

P acc = min{l,exp[5 cff (y)-5 eff (y')] expAS r } (11) 

AS r = HtlQt(V')Q r (V f )-Qi(V)Q r (VM r , (12) 

where the vector £ r is generated according to the probability distribution 

Pfo.) a exp{-ClQt(V')Q r (V%} . (13) 

In practice, we start by generating a random Gaussian vector R from which the vectors 

= Q\V')R, (14) 
X' = [Q\V)Q(V')}- 1 <5>' . (15) 

are constructed. The vector £ r in eq. fli~3| ) is then given by ^ r = A(V')X' . The two 
terms in eq. (|i~2|) are calculated with the formulae 

£Qt(v')Q r (y% = ^x', (i6) 
£Qi(v)Q r (v)£ r = x / tA(y')^(^)" 1 Q t (v)Q(y)A(y)- 1 ^(F')^ / - (17) 



All the vectors X', £ r and the traces in eq. ( |i0[ ) are defined on the even sites of the 
lattice only. 

The real parameters a 4 and «2 can be optimized to maximize P aC cj eq. (11). We 
keep the choice a 4 = —0.006, a.2 = —0.18 from our previous work Q where the values 
were found to be optimal for an action with smeared links obtained by three levels of 
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ordinary APE blocking |26|| . The computation of the matrix A in eq. @ requires the 
evaluation of the exponential function which we do by truncating the exponential series 
after 15 terms. 

In Monte Carlo simulations we separate the measurements of the observables by 
Aor overrelaxation and Arb heatbath two-step updatings, changing £or and £hb links 
respectively. On 8 3 x 24 lattices we simulated the HYP action with gauge coupling 
(3 = 5.2 and bare quark mass am = 0.1. An approximate matching of the lattice 
spacing (a ~ 0.17 fm) and of the Goldstone pion mass (mc^o = 2.0) can be achieved by 
simulating the thin link action with parameters (3 = 5.2 and am = 0.06. In an attempt 
to make a reliable comparison of the computational cost of the two simulations, we give 
in the last column of table ^ estimates for the number of Q^Q multiplications required 
to get an independent configuration. These estimates are obtained by considering the 
autocorrelation time T m t{p) for the plaquette. By independent configurations we mean 
configurations obtained every 2ri nt (p) updating steps. 



Table 2: Comparison of the computational cost for creating an independent configura- 
tion on a 10 fm 4 lattice. 



Algorithm 


2r int 


Q+Q/CG 


Q^Q/updating 


upds./conf. 


QtQ/indep. conf. 


thin HMC 


24(9) 


135 


67 x 135 « 9000 


1 


0.22(8) x 10 6 


HYP 


30(14) 


77 


77+60+(33)=170 


240 


1.2(6) x 10 6 



We found that measurements with the HYP action separated by Aor = 160 over- 
relaxation and Ahb = 80 heatbath updatings have T m t(p) = 15(7). Each overrelaxation 
updating changes ioR = 128 and each heatbath updating changes tuB = 200 links with 
an acceptance of ~ 20%, corresponding to an effective change of 15% of all the links 
between measurements. Each updating, with overrelaxation or heatbath, requires one 
inversion of Qi (V)Q(V), see eq. (]I5|), performed with one conjugate gradient iteration 
(CG). Each CG iteration requires with these parameters on average 77 multiplications 
by (V)Q(V). In addition, each updating requires twice the computation of the matrix 
A(V), see eq. (|l~7|), for a total of 60 multiplications by Q' (V)Q(V). The recomputation 
of the HYP links once 200 dynamical thin links are changed has a cost on this lattice 
volume comparable to 33 multiplications by Q j[ (V)Q(V). All together, we need 170 
multiplications by Q\V)Q{V) for each updating. 

We simulated the thin link action with HMC. We found that measurements sepa- 
rated by one HMC trajectory of unit time length have T; n t(p) = 12(5). One trajectory 
requires about 67 CG iterations, each of which needs with the given parameters on 
average 135 multiplications by Q\U)Q{U). 

In summary, an independent configuration for the HYP simulation costs 1.2(6) x 10 6 
multiplications by (V)Q(V) and for the thin link simulation 0.22(8) x 10 6 multipli- 
cations by Q*(U)Q{U). This gives a factor 5(3) for the HYP computational cost over 
thin link HMC. To achieve the same improvement of flavor symmetry and scaling of 
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the HYP action, the lattice spacing of thin link action simulations should be reduced 
at least by a factor 2. The overall gain of simulations with the HYP action is evident. 

A drawback of our algorithm is that as the lattice volume increases the numbers of 
the updated links toR and £hb have to be kept unchanged, consequently the numbers of 
updatings Aor and Ahb have to be scaled with the volume to keep the autocorrelation 
times unchanged. Since each updating requires the evaluation of the fermionic action 
this gives a volume square dependence for the cost of the algorithm at fixed lattice 
spacing. On the other hand as the continuum limit is approached the numbers of links 
tOR and £hb which can be effectively updated scale: the physical volume of the updated 
region is constant. Considering our simulation of the HYP action at f3 = 5.2 and am = 
0.1, we can update a physical volume 0.027(1) fm 4 with one overrelaxation updating 
and 0.042(2) fm 4 with one heatbath updating with 20% acceptance rate. Simulating 
the HYP action at (3 = 5.0 and am = 0.1, with a lattice spacing which is 30-40% larger, 
we can update a physical volume 0.034(6) fm 4 with one overrelaxation updating and 
0.056(10) fm 4 with one heatbath updating with 20% acceptance rate. 

To improve the efficiency of the HYP update one can improve the stochastic esti- 
mator needed in the acceptance step, adding a D 6 term in the definition of the matrix 
A eq. ([]). Furthermore, the pure gauge action used in the heatbath and overrelaxation 
updatings can be changed to make a better proposal for the acceptance step and the 
convergence of the exponential series calculation for the matrix A can be made faster. 
Some of these improvements are being discussed in a forthcoming publication 27]. 



5 Conclusions 



We presented a new simulation algorithm applicable to QCD actions where the fermions 
are coupled via smeared links. The algorithm is based on the standard overrelaxation 
and heatbath algorithms. We used it to simulate four flavors of staggered fermions with 
HYP smeared links. The cost of the algorithm has a volume square dependence at fixed 
lattice spacing due to the finite volume of the lattice region that can be updated, but 
the physical volume of the updated region remains constant as the continuum limit is 
approached. A comparison with the time cost of standard staggered action simulations 
with Hybrid Monte Carlo shows that the new algorithm is a factor 2-8 slower on lattices 
of physical volume of about 10 fm 4 at m^ro = 2.0. The improvement in flavor symmetry 
and scaling of the HYP action allows to gain a factor 2 in the lattice spacing a. Given 
that the present cost of QCD simulations grows at least as 1/a 6 there is a consistent 
improvement with the new algorithm. 

An interesting point is that the algorithm could be used to simulate chiral fermionic 
actions like overlap or perfect actions. The presence of smeared links is the key feature 
which makes the algorithm efficient. 
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